Improved Induction Tree Training for Automatic Lexical Categorization

نویسندگان

M. Daniela López De Luise

M. Soffer

F. Spelanzon

چکیده

This paper studies a tuned version of an induction tree which is used for automatic detection of lexical word category. The database used to train the tree has several fields to describe Spanish words morpho-syntactically. All the processing is performed using only the information of the word and its actual sentence. It will be shown here that this kind of induction is good enough to perform the linguistic categorization. Index Term — Machine Learning, lexical categorization, morphology, syntactic analysis.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Integrating a Lexical Database and a Training Collection for Text Categorization

Automatic text categorization is a complex and useful task for many natural language processing applications. Recent approaches to text categorization focus more on algorithms than on resources involved in this operation. In contrast to this trend, we present an approach based on the integration of widely available resources as lexical databases and training collections to overcome current limi...

متن کامل

That's So Annoying!!!: A Lexical and Frame-Semantic Embedding Based Data Augmentation Approach to Automatic Categorization of Annoying Behaviors using #petpeeve Tweets

We propose a novel data augmentation approach to enhance computational behavioral analysis using social media text. In particular, we collect a Twitter corpus of the descriptions of annoying behaviors using the #petpeeve hashtags. In the qualitative analysis, we study the language use in these tweets, with a special focus on the fine-grained categories and the geographic variation of the langua...

متن کامل

Spanish/English Cross-Lingual Categorization

This article deals with the problem of Cross-Lingual Text Categorization (CLTC), which arises when documents in different languages must be classified according to the same classification tree. We describe practical and cost-effective solutions for automatic Cross-Lingual Text Categorization, both in case a sufficient number of training examples is available for each new language and in the cas...

متن کامل

Cross-Lingual Text Categorization

متن کامل

Using WordNet to Complement Training Information in Text Categorization

Automatic Text Categorization (TC) is a complex and useful task for many natural language applications, and is usually performed through the use of a set of manually classified documents, a training collection. We suggest the utilization of additional resources like lexical databases to increase the amount of information that TC systems make use of, and thus, to improve their performance. Our a...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2008

Improved Induction Tree Training for Automatic Lexical Categorization

نویسندگان

چکیده

منابع مشابه

Integrating a Lexical Database and a Training Collection for Text Categorization

That's So Annoying!!!: A Lexical and Frame-Semantic Embedding Based Data Augmentation Approach to Automatic Categorization of Annoying Behaviors using #petpeeve Tweets

Spanish/English Cross-Lingual Categorization

Cross-Lingual Text Categorization

Using WordNet to Complement Training Information in Text Categorization

عنوان ژورنال:

اشتراک گذاری